Automating insect monitoring using unsupervised near-infrared sensors

Insect monitoring is critical to improve our understanding and ability to preserve and restore biodiversity, sustainably produce crops, and reduce vectors of human and livestock disease. Conventional monitoring methods of trapping and identification are time consuming and thus expensive. Automation would significantly improve the state of the art. Here, we present a network of distributed wireless sensors that moves the field towards automation by recording backscattered near-infrared modulation signatures from insects. The instrument is a compact sensor based on dual-wavelength infrared light emitting diodes and is capable of unsupervised, autonomous long-term insect monitoring over weather and seasons. The sensor records the backscattered light at kHz pace from each insect transiting the measurement volume. Insect observations are automatically extracted and transmitted with environmental metadata over cellular connection to a cloud-based database. The recorded features include wing beat harmonics, melanisation and flight direction. To validate the sensor’s capabilities, we tested the correlation between daily insect counts from an oil seed rape field measured with six yellow water traps and six sensors during a 4-week period. A comparison of the methods found a Spearman’s rank correlation coefficient of 0.61 and a p-value = 0.0065, with the sensors recording approximately 19 times more insect observations and demonstrating a larger temporal dynamic than conventional yellow water trap monitoring.

Insecta is the most speciose class of terrestrial fauna 1 and the majority of the world's biodiversity is composed of this class 2 . In epidemiological and agricultural ecosystems, insects serve as both beneficial organisms [3][4][5] and economic pests 6,7 . Data on insects can support biodiversity conservation 8,9 , human health protection 10 and increased food production 11 .
Insects are monitored via established sampling methods including trapping, sweep netting, and portable aspiration [12][13][14] . These methods are imperfect resulting in biases towards size [15][16][17] and stage 18 . Additionally, conventional methods may be time-consuming, costly and prone to human error such as person-to-person variation in sampling execution [19][20][21] . New methods, like insect anesthetization sampling 22 , are being implemented to minimize these biases. Regardless of sampling method, insect identification is time consuming and requires specialized training.
In order to reduce the cost of insect monitoring and identification, automation of insect trapping [23][24][25][26][27] and identification [27][28][29][30][31] has been developed. While these methods could greatly improve monitoring via traps, they are unsuitable for monitoring a general insect population since trap designs and baits are generally biased in regard to species 32,33 .
Automation of insect monitoring without traps could reduce species bias of conventional methods and human error, thus greatly improving the state of the art. Insect identification has been automated as early as 1973 using wingbeat frequency [34][35][36] , and today remote insect sensing includes acoustic detection 37 , radar observations [38][39][40] and lidar [41][42][43] . Acoustic methods work best with a solid medium 26,44 , though acoustic monitoring of free flying insects has been demonstrated [45][46][47] . While radar technologies have much larger monitoring range 16 www.nature.com/scientificreports/ they are unsuitable for monitoring small insects, or insects around vegetation, such as a crop canopy. Optical methods were early used as to overcome many of these limitations [51][52][53] . Today, lidar can be used to record a large number of observations in a long transect [54][55][56][57][58] and distinguish between species groups by wingbeat frequency (WBF) 55,59 . However, lidar equipment requires a trained operator and requires constant supervision due to eye safety restrictions.
Here we present an autonomous near-infrared sensor for monitoring of flying insects in the field. The sensor aims to minimize human biases, be usable by non-technical personnel, and be capable of unsupervised longterm monitoring. Compared to existing entomological lidars, it has a smaller measurement volume but is eye safe and weatherproof.

Instrument design
The sensor is weatherproof, compact, and intended for field use by non-technicians. Like entomological lidar instrumentation, an air volume is illuminated, and light backscattered from insects entering the measurement volume is recorded by a high-speed photodetector. In addition, the instrument is equipped with a satellite navigation device, a camera for situational photos, and an environmental sensor monitoring temperature, humidity, and light intensity. An internal Global System for Mobile Communications (GSM) modem allows for communication and data transfer. The sensor can be powered by any 12 V power supply, including utility power, batteries, or solar power, and has a maximum power consumption of 30 W during monitoring. A photo of the sensor is shown in Fig. 1 and an internal block diagram is described in Fig. 2. Emitter. The emitter module consists of a rectangular array of LEDs emitting two spectral bands at 808 nm and 980 nm with total output of 1.6 W and 1.7 W, respectively. The two wavelengths are modulated in a square wave at 118.8 kHz and 79.2 kHz respectively. The LEDs are mounted in a checkerboard pattern to achieve a homogeneous beam profile. The total area of the checkerboard, and thus the beam size at the source, is 82 cm 2 . The light emitted from each diode is partially collimated by an asymmetrical lens and expands with 20° and 4° diverging angles ( θ E ). The full width half maximum (FWHM) of the emitted light is 26 nm for the 808 nm band and 47 nm for the 970 band.
Receiver. The backscattered light from insects entering the overlap between the beam and the receiver's field of view (FoV) is collected by a near infrared coated aspheric lens (60 mm focal length, ø 76.2 mm aperture) onto a silicon quadrant photodiode (QPD) with a total area of 1 cm 2 . The receiver is focused at 1 m and has a 4° divergence angle ( θ R ). Quadrant detection of insects allow for basic range and size estimation 60,61 and can differentiate ascending and descending insects as well as migrating insects with tailwind or host-or scent-seeking insects with headwind. . This allows the two spectral bands to be recorded independently on each quadrant, resulting in an 8-channel data stream. The data is then filtered by a low-pass filter with a cut-off at 5 kHz and digitally sampled to a 20 kHz, 16-bit data stream before it is sent to a microcontroller unit (MCU) for event extraction and further processing (Fig. 3). Since insects generally have wing beat frequencies below 1 kHz, a 5 kHz cutoff allows us to resolve a minimum of five harmonics in the frequency spectra. The increase in bit depth is possible due to the oversampling of the unfiltered signal.

Measurement volume.
The measurement volume is defined by the overlap between the beam and the FoV.
Its size and shape can be adjusted by changing the angle ( θ S ) between the emitter and receiver. The beam, FoV and the measurement volume have been mapped by a custom-built 3-axis robot covering a volume of 2 m × 1.5 m × 1.5 m. The robot is equipped with a photodetector, an illumination source, and a sphere dropping mechanism. The photo-detector and illumination source are used to map the emitted beam and FoV respectively while sphere dropping mechanism allow us to verify the signal intensity from a standard object at any point in the volume. Using these methods, the signal response from an arbitrary target can be estimated. The volumes were measured at 20 planes along the Z axis, from 30 to 1655 mm, each plane consisting of 56 × 56 measurement points in a 12 mm grid. The calculated signals were then compared to actual measurement values by dropping black and white spheres. The white spheres were assumed to be 100% reflective and the black spheres had a 5% reflectivity.
The measurement volume properties for targets with various optical cross sections (OCS) at different angles are shown in Table 1. The size of the measurement volume is dependent on the minimum acceptable sensitivity, which is related to the noise in the instrument. In the following results, the edge of the volume is defined as the limit where the signal to noise ratio (SNR) is larger than 10 for typical noise levels in a field installation. The signal to noise ratio is defined as the maximum value of the recorded signal divided by the peak-to-peak noise. The volumes for a 10 mm 2 target are shown in Fig. 4. Data processing. Automated event extraction. The sensor records intervals of 10 min (4 quadrants, 2 spectral bands, 16 bit and 20 kHz sample rate after demux of carrier frequency) and automatically extracts insect observations from each recording. The event extraction is inspired by earlier work but modified to reduce computational load 42,43,59,62 . The event extraction algorithm was developed during prior experiments in various www.nature.com/scientificreports/ conditions. In simple terms, it aims to quantify the noise level and subsequently multiply it with a signal-to-noise factor to yield a threshold. All events that exceed this threshold are then extracted.
In the chosen implementation, the signal in each channel was downsampled to 2 kHz and a rolling median boxcar filter with a width of 2 s and 50% overlap was used to estimate the quasi-static baselines (the baselines can change with environmental conditions, static objects in the beam etc.). A 2 s window width makes the median estimation insensitive to insect observations, which has an average transit time of ca 100 ms. The standard deviation of the baseline was measured with an identical filter, applied to all datapoints below the median. The selection of values below the median reduces the influence of rare events, such as insects, on the noise level estimation.
The interpolated median signals were removed from the full resolution data and we employed a Boolean condition for insect detection when the time series exceed ten times the estimated standard deviation. A high threshold factor rejected weak observations which could yield unreliable results in the downstream feature extraction. The Boolean time series were eroded by 500 µs and dilated by 30 ms. The erosion rejects short spikes, outliers and insect signals to short to be interpreted and the dilation includes insect observation flanks. The logical OR function was applied across all QPD-quadrants and spectral channels. Extracted observations are transmitted to a cloud database along with metadata such as baseline and noise level, via GSM connection or stored locally until a connection is available. An example of the event extraction process is shown in Fig. 5, and the insect event is shown in greater detail in Fig. 6.
Each insect observation, along with its associated timestamp and device identifier, is automatically uploaded to the cloud via one-way AMQP (Advanced Message Queuing Protocol), with unique connections for each device. Virtual computing is then used to further process, analyze, and securely store data for further use and aggregation.
Feature extraction/data interpretation. The QPD segments collect backscattered light from different sections of the measurement volume. For a single object passing through the measurement volume, the signal strength within each QPD-quadrant is related to the object's OCS as well as its position. As the OCS varies with each wingbeat, the wingbeat frequency can be resolved. Many methods have been used to extract the wingbeat Figure 3. Frequency diagram. The wide beam yields long insect transit times, and the corresponding frequency resolution is high enough to accurately capture most species. The frequency response curve (red) is flat in the wingbeat frequency region and the effect of the LP filter at 5 kHz is indicated. The 5 kHz bandwidth allows a minimum of 4 harmonic overtones to be recorded even for mosquitoes with very high wingbeat frequencies. www.nature.com/scientificreports/  www.nature.com/scientificreports/ frequency from insect observations [62][63][64] and most are based on identifying the fundamental frequency in the frequency domain, as shown in Fig. 6b. In addition to the wingbeat frequency, the body and wing contribution can be measured from each time signal which allows calculation of additional features such as body-to-wing ratio. Additional features can be calculated by comparing the relative intensity of the body and wing signals in the two spectral bands. These bands differentially index melanin absorption [65][66][67] and may yield some sensitivity to wing interference patterns 66,68,69 , although not enough to uniquely determine wing membrane thickness. Together these features can be used to quantify the morphology of different insect groups and allow remote classification of insects according to order, family, genus or species 32,64,69,70 .

Field validation
Methodology. The sensor was field-tested against a conventional insect monitoring method, yellow water traps (22 cm diameter) 33,71 , in an organic oilseed rape (Brassica napus L.) field in the vicinity of Sorø, Denmark (55° 29′ 04.3″ N 11° 29′ 34.6″ E). During a four-week period (04/22/20-05/22/20), insects were monitored with six sensors and six yellow water traps. The water traps were filled with water and soap, immediately drowning any insects landing in the trap. Sensors and traps were placed in a grid pattern, consisting of four linear transects 30 m from and perpendicular to the field's southern-most edge. This is illustrated in Fig. 7. Each transect consisted of three monitoring points (either sensors or traps) with 45 m spacing, and a separation of 22.5 m between transects. The first and third transect consisted of sensors and the second and fourth were yellow water traps. During the field study presented in this work, θ S was set to 20° in order to maximize the signal strength of small targets at close range.
Fundamentally the two methods observe different insect behaviors. While the sensor looks at insects flying above the crop canopy, the yellow water traps look at insects that occur within it. Further confounding the comparison, yellow is attractive to some insects 33 . Therefore, some proportion of insects will be attracted to the yellow water traps, resulting in overrepresentation of some species 72,73 . However, water traps constitute the standard practice for pest monitoring in oilseed rape for many species. Data analysis. The water traps were emptied daily, except for Sundays and 7 additional days (3 sample days in late April and 4 days in mid May) where we were unable to empty the traps. Sensor data was recorded continuously. All insects in the traps were collected, but to allow for a more direct comparison of methods, non-flying insects and thrips found in water traps were excluded from further analysis.
The sensor data was aggregated according to the collection time of the water traps. Insects trapped during Sundays were added to the following days count and the number of collected insects was normalized by the number of trapping days. One day, April 30th, was excluded due to instrument malfunction. The average number of recorded insect observations per sensor per day and per hour was calculated. The calculated numbers were normalized by sensor uptime, which was on average 90% throughout the measurement period. Observations during heavy rainfall and without any distinguishable wingbeat frequency, ca 1% of the observations, were automatically removed from the data using a classification algorithm. www.nature.com/scientificreports/

Results
The insect activity recorded by the sensors and traps respectively are shown in Fig. 8. Insect counts from sensors and traps cannot be directly equated due to differences in measurement subject (insect flights vs insect landings) and non-homogeneous insect distribution; however, they serve to visualize similarities in gross changes in insect activity over the sample period. The results demonstrate a significant correlation between the sensor and trap results, specifically with a Spearman's rank correlation coefficient of 0.61 and a p-value = 0.0065 74 . Over the course of the season, an average of 1122 ± 242 (SE) insect observations per day were collected per sensor (excluding downtime), compared to an average of 63 ± 6 (SE) insects caught per water trap per day over the same period.

Discussion
Here we present a sensor for automated unsupervised field monitoring of insect flight activity. The sensor illuminates an air volume and records the backscattered light from insects that fly through the measurement volume. Discrete insect observations are automatically extracted from the continuous raw data flow and transmitted over a cellular connection to a database in the cloud. Field validation showed the number of recorded insect observations correlates with the number of individual insects trapped by a conventional insect monitoring method. Furthermore, the sensor recorded an order of magnitude more insects than the conventional method over the same period. The automation of insect monitoring has the potential to reduce monitoring bias, cost, and human labor, potentially resulting in an increased ability to collect large quantities of biodiversity, public health, and economically relevant insect data. Additionally, the observations from the sensors were available in real time, whereas emptying and counting insects from traps required a significant amount of labor. While this work was limited to comparing total insect counts from the traps, it is possible for a skilled expert to identify these insects to the sub-species level. This is an area were the traps currently have a strong advantage over this sensor and similar instrumentation. Developing and evaluating species specific insect classification algorithms is therefore a major focus. Significant work is still needed prior to field implementation to test possible use cases and limitations of this system.
One of the most striking differences in monitoring methods is the day-to-day variability in the number of data points collected (Fig. 7). While the yellow traps catch a similar number of insects each day, the difference between low and high flight activity days were more visible in the sensor. Early analysis of the trap and sensor www.nature.com/scientificreports/ data indicates that the peak recorded during May 7-11 is due to a pollen beetle (Brassicogethes aeneus) activity spike. This will be the subject of further studies. Another marked difference between the sensor and the water traps is the number of data points collected over the same collection period. Each sensor observed ~ 19× more insect observations than insects collected in the water trap. While in general the correlation between the two values is considered more relevant than the absolute number, one advantage of a much higher observation is the ability to get statistically sound data aggregated with very high temporal resolution. In this work, the data was aggregated to match the collection times of the traps but it could easily be aggregated down to hourly activity. The higher temporal resolution and continuous monitoring during unsociable hours allows for the comparatively easy and low-labor collection of data on insect circadian rhythms, as well as direct weather interactions.
We hypothesize that the sensors observe different insect behaviors compared to conventional monitoring methods since only airborne (flying or jumping) insects are recorded. Therefore, we did not expect a perfect correlation between the sensors and the conventional methods. Sweep netting is likely the most similar monitoring method since it also catches insects in flight above the crop. However, sweep netting, which also collects insects on plants, occurs at a point measurement in time and is typically performed along a transect, rather than at a fixed point in the field 19 . Also, each trapping method is biased towards different insects, influencing catch 15,17 .
Trapping methods, such as the water traps used in this study, monitor insects landing, walking, or jumping to a specific point and do not record insects in flight. Also, each trapping method is biased towards different insects, with the trap color influencing the trap catch 33 . It would therefore be beneficial to include multiple trap types in the ground truthing in future work. Additionally, both the sensors and the conventional ground truthing methods assume that the recorded insect activity in one specific point in the field is representative of the insect activity in the near surroundings.
Although we do not fully understand in what manner, the sensor is also most likely biased towards reporting certain species groups. Most primarily, its only capable of recording airborne insects and unsuitable for monitoring during rain. Insect vision is focused towards the visual or ultraviolet spectrum and not capable of resolving infrared light and we believe the emitted beam has very little influence on insect behavior 75 . However, in a homogeneous landscape such as an agricultural field, any foreign object placed above the canopy could serve as an attractant to insects. Finally, the size of the measurement volume varies with the OCS of the insects and larger insects will be over-represented. To provide a complete picture of the insect population, this should be considered. Along with species specific observations, this is an area where we expect significant progress.
Automated insect monitoring has the potential to facilitate pest prevention, public health studies and biodiversity monitoring. Compared to alternative methods, such as automated traps, the sensor described in this paper comes at a higher cost per unit and with higher power requirements. Compared to previously described entomological lidars, we record fewer observations, but with a longer transit time and higher sampling frequency. We believe the advantage of entomological lidars, such as the sensor described in this paper, is the ability to potentially monitor and discriminate between multiple species using a single instrument.
In further work we will explore the possibilities of unsupervised long-term monitoring of insect activity and species recognition. www.nature.com/scientificreports/

Conclusions
In this work, we have introduced an unsupervised automated sensor for insect monitoring. The measurement principle is similar to entomological lidar setups but is optimized for near-field measurements. This simplifies the installation process and increases the robustness of the sensor, allowing it to be operable by non-technical experts and enables long-term unsupervised monitoring. The sensor automatically extracts insect events from the raw data and transmits these via a built-in modem for further processing. From the recorded observations, features such as the wingbeat frequency, body-wing ratio, and melanisation factor are computed and used to predict the insect classification down to species. During a 4-week deployment in an oilseed rape field, the detected flight activity was shown to be correlated with a conventional monitoring method.
The capabilities, standardization, and scalability of this sensor-based method has the potential to improve the state of the art in insect monitoring. To date, 119 similar units have been deployed in field and in 2021 the cloud database encompassed > 18 million insect observations. The sensor can be used to explore areas such as biodiversity assessment, insecticide resistance, and long-term monitoring of remote areas, facilitating research studies currently difficult or impossible to conduct with conventional methods.